 event decomposition


Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval

Shubhashis Roy Dipta, Francis Ferraro

arXiv.org Artificial Intelligence

Recent approaches have shown impressive proficiency in extracting and leveraging parametric knowledge from Large Language Models (LLMs) and Vision-Language Models (VLMs). In this work, we consider how we can improve the identification and retrieval of videos related to complex real-world events by automatically extracting latent parametric knowledge about those events. We present Q2E: a Query-to-Event decomposition method for zero-shot multilingual text-to-video retrieval, adaptable across datasets, domains, LLMs, or VLMs. Our approach demonstrates that we can enhance the understanding of otherwise overly simplified human queries by decomposing the query using the knowledge embedded in LLMs and VLMs. We additionally show how to apply our approach to both visual and speech-based inputs. To combine this varied multimodal knowledge, we adopt entropy-based fusion scoring for zero-shot fusion. Through evaluations on two diverse datasets and multiple retrieval metrics, we demonstrate that Q2E outperforms several state-of-the-art baselines. Our evaluation also shows that integrating audio information can significantly improve text-to-video retrieval. We have released code and data for future research.
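
The abstract names entropy-based fusion scoring but gives no formula, so the following is only a minimal sketch of one generic way such a fusion could work, not the authors' implementation. It assumes each modality (for example visual and speech) yields one similarity score per candidate video, turns each score list into a probability distribution, and gives more weight to the modality whose distribution has lower entropy. The function names and the weighting rule are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_fusion(score_lists):
    """Fuse per-modality score lists over the same candidates.

    Assumption (not from the paper): a modality's weight is
    1 - H / H_max, so a peaked, confident distribution counts more
    than a flat, uncertain one. Returns fused probabilities.
    """
    n = len(score_lists[0])
    max_entropy = math.log(n) if n > 1 else 1.0
    dists = [softmax(s) for s in score_lists]
    # Clamp at zero to absorb floating-point noise near uniform inputs.
    weights = [max(0.0, 1.0 - entropy(d) / max_entropy) for d in dists]
    total = sum(weights)
    if total <= 1e-12:
        # Every modality is uninformative: fall back to equal weights.
        weights = [1.0 / len(dists)] * len(dists)
    else:
        weights = [w / total for w in weights]
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(n)]


# Hypothetical scores for three candidate videos.
visual_scores = [2.0, 0.1, 0.1]   # peaked: visual evidence favors video 0
speech_scores = [0.5, 0.5, 0.5]   # flat: speech evidence is uninformative
fused = entropy_weighted_fusion([visual_scores, speech_scores])
best_video = fused.index(max(fused))
```

In this sketch the flat speech distribution receives a weight of essentially zero, so the fused ranking follows the visual scores. No training is involved, which is the sense in which a scheme like this can be applied zero-shot.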


Great apes may have cognitive foundations for language

Popular Science

You see a cat chasing a mouse. You probably don't realize it, but as soon as you catch sight of this scene unfolding, your brain makes a key distinction between the cat and the mouse: It identifies who's chasing, and who's being chased. This capacity to distinguish between the "agent" (the entity performing an action) and the "patient" (the entity upon which that action is being performed) is called "event decomposition," and it's long been thought that it was unique to humans. However, a new study published in PLOS Biology on November 26 suggests that this is not the case: great apes (specifically gorillas, chimpanzees, and orangutans) also seem to track events in the way that we do, distinguishing between agent and patient. This finding is notable because scientists believe event decomposition lies at the heart of something that is unique to humans.